
    Preserving Randomness for Adaptive Algorithms

    Suppose Est is a randomized estimation algorithm that uses n random bits and outputs values in R^d. We show how to execute Est on k adaptively chosen inputs using only n + O(k log(d + 1)) random bits instead of the trivial nk (at the cost of mild increases in the error and failure probability). Our algorithm combines a variant of the INW pseudorandom generator [Impagliazzo et al., 1994] with a new scheme for shifting and rounding the outputs of Est. We prove that modifying the outputs of Est is necessary in this setting, and furthermore, our algorithm's randomness complexity is near-optimal in the case d [...] {-1, 1} using O(n log n) * poly(1/theta) queries to F and O(n) random bits (independent of theta), improving previous work by Bshouty et al. [Bshouty et al., 2004].
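
    The savings claimed above are easy to sanity-check numerically. The sketch below is a rough, hypothetical illustration (not the paper's construction): it compares the trivial budget of n fresh bits per adaptive call with a shared budget of the form n + O(k log(d + 1)), and shows the generic "shift then round" idea of adding one shared random offset to an estimator's d-dimensional output and snapping it to a coarse grid, so that nearby outputs collapse to the same value. The constant c, the grid width, and the function names are all assumptions made for illustration.

        import math
        import random

        def trivial_budget(n, k):
            # fresh randomness for every one of the k adaptive calls
            return n * k

        def shared_budget(n, k, d, c=4):
            # a budget of the form n + O(k log(d + 1)); the constant c is hypothetical
            return n + c * k * math.ceil(math.log2(d + 1))

        def shift_and_round(output, grid=0.1, shift=None):
            # generic "shift then round" post-processing: add one shared random
            # offset, then snap every coordinate to a grid of width `grid`
            if shift is None:
                shift = random.uniform(0.0, grid)
            return tuple(round((x + shift) / grid) * grid - shift for x in output)

        if __name__ == "__main__":
            n, k, d = 1000, 50, 8
            print(trivial_budget(n, k), shared_budget(n, k, d))  # 50000 vs. 1800
            print(shift_and_round((0.123, 0.128, 0.911)))        # nearby outputs tend to collapse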

    A Moment-Matching Approach to Testable Learning and a New Characterization of Rademacher Complexity

    A remarkable recent paper by Rubinfeld and Vasilyan (2022) initiated the study of testable learning, where the goal is to replace hard-to-verify distributional assumptions (such as Gaussianity) with efficiently testable ones and to require that the learner succeed whenever the unknown distribution passes the corresponding test. In this model, they gave an efficient algorithm for learning halfspaces under testable assumptions that are provably satisfied by Gaussians. In this paper we give a powerful new approach for developing algorithms for testable learning using tools from moment matching and metric distances in probability. We obtain efficient testable learners for any concept class that admits low-degree sandwiching polynomials, capturing most important examples for which we have ordinary agnostic learners. We recover the results of Rubinfeld and Vasilyan as a corollary of our techniques while achieving improved, near-optimal sample complexity bounds for a broad range of concept classes and distributions. Surprisingly, we show that the information-theoretic sample complexity of testable learning is tightly characterized by the Rademacher complexity of the concept class, one of the most well-studied measures in statistical learning theory. In particular, uniform convergence is necessary and sufficient for testable learning. This leads to a fundamental separation from (ordinary) distribution-specific agnostic learning, where uniform convergence is sufficient but not necessary.
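
    As a concrete (and deliberately simplified) illustration of the moment-matching ingredient mentioned above, the sketch below tests whether the low-degree moments of each coordinate of a sample are close to those of a standard Gaussian and accepts or rejects accordingly. The degree cutoff, the tolerance, and the per-coordinate (rather than multivariate) check are illustrative choices, not the tests used in the paper.

        import numpy as np

        # moments E[x^j] of the standard Gaussian N(0, 1) for j = 1..4
        GAUSSIAN_MOMENTS = {1: 0.0, 2: 1.0, 3: 0.0, 4: 3.0}

        def passes_moment_test(X, max_degree=4, tol=0.1):
            # accept iff every per-coordinate empirical moment up to max_degree
            # is within tol of the corresponding Gaussian moment
            X = np.asarray(X, dtype=float)
            for j in range(1, max_degree + 1):
                empirical = (X ** j).mean(axis=0)  # one value per coordinate
                if np.any(np.abs(empirical - GAUSSIAN_MOMENTS[j]) > tol):
                    return False
            return True

        if __name__ == "__main__":
            rng = np.random.default_rng(0)
            print(passes_moment_test(rng.standard_normal((50000, 3))))  # True (with high probability)
            print(passes_moment_test(rng.uniform(-1, 1, (50000, 3))))   # False: E[x^2] = 1/3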

    An Efficient Tester-Learner for Halfspaces

    We give the first efficient algorithm for learning halfspaces in the testable learning model recently defined by Rubinfeld and Vasilyan (2023). In this model, a learner certifies that the accuracy of its output hypothesis is near optimal whenever the training set passes an associated test, and training sets drawn from some target distribution -- e.g., the Gaussian -- must pass the test. This model is more challenging than distribution-specific agnostic or Massart noise models, where the learner is allowed to fail arbitrarily if the distributional assumption does not hold. We consider the setting where the target distribution is Gaussian (or more generally any strongly log-concave distribution) in d dimensions and the noise model is either Massart or adversarial (agnostic). For Massart noise, our tester-learner runs in polynomial time and outputs a hypothesis with (information-theoretically optimal) error opt + ε for any strongly log-concave target distribution. For adversarial noise, our tester-learner obtains error O(opt) + ε in polynomial time when the target distribution is Gaussian; for strongly log-concave distributions, we obtain Õ(opt) + ε in quasipolynomial time. Prior work on testable learning ignores the labels in the training set and checks that the empirical moments of the covariates are close to the moments of the base distribution. Here we develop new tests of independent interest that make critical use of the labels and combine them with the moment-matching approach of Gollakota et al. (2023). This enables us to simulate a variant of the algorithm of Diakonikolas et al. (2020) for learning noisy halfspaces using nonconvex SGD, but in the testable learning setting.
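
    The contract described above (test first, learn only if the test accepts) can be captured by a small wrapper. The sketch below is a schematic stand-in, not the paper's tester or learner: the toy test only checks that both label classes are reasonably represented, and the toy learner is the classical "average of label-weighted examples" halfspace; all names and thresholds are hypothetical.

        import numpy as np

        def tester_learner(X, y, tester, learner):
            # only emit a hypothesis when the labeled training set passes the test;
            # otherwise refuse, since the error guarantee cannot be certified
            if not tester(X, y):
                return None
            return learner(X, y)

        def toy_tester(X, y, min_fraction=0.05):
            # placeholder label-aware check: reject badly imbalanced label sets
            frac = np.mean(y > 0)
            return min_fraction <= frac <= 1 - min_fraction

        def toy_learner(X, y):
            # average of label-weighted examples, a classical weak halfspace learner
            w = (y[:, None] * X).mean(axis=0)
            return lambda x: np.sign(x @ w)

        if __name__ == "__main__":
            rng = np.random.default_rng(1)
            X = rng.standard_normal((2000, 5))
            y = np.sign(X[:, 0])                      # labels from a simple halfspace
            h = tester_learner(X, y, toy_tester, toy_learner)
            print(None if h is None else np.mean(h(X) == y))  # training accuracy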

    List-decoding Reed-Muller codes over small fields

    We present the first local list-decoding algorithm for the r-th order Reed-Muller code RM(r,m) over F_2 for r ≥ 2. Given an oracle for a received word R: F_2^m → F_2, our randomized local list-decoding algorithm produces a list containing all degree-r polynomials within relative distance (2^{-r} − ε) from R for any ε > 0, in time poly(m^r, ε^{-r}). The list size could be exponential in m at radius 2^{-r}, so our bound is optimal in the local setting. Since RM(r,m) has relative distance 2^{-r}, our algorithm beats the Johnson bound for r ≥ 2. In the setting where we are allowed running time polynomial in the block length, we show that list-decoding is possible up to even larger radii, beyond the minimum distance. We give a deterministic list-decoder that works at error rate below J(2^{1-r}), where J(δ) denotes the Johnson radius for minimum distance δ. This shows that RM(2,m) codes are list-decodable up to radius η for any constant η < 1/2 in time polynomial in the block length. Over small fields F_q, we present list-decoding algorithms in both the global and local settings that work up to the list-decoding radius. We conjecture that the list-decoding radius approaches the minimum distance (like over F_2), and prove this holds true when the degree is divisible by q − 1.
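
    The claim that radius 2^{-r} beats the Johnson bound for r ≥ 2 is easy to verify numerically, assuming the standard Johnson list-decoding radius for binary codes of relative distance δ, namely J(δ) = (1 − √(1 − 2δ))/2; the short check below just tabulates both quantities.

        import math

        def johnson_radius(delta):
            # standard Johnson list-decoding radius for a binary code of
            # relative minimum distance delta: (1 - sqrt(1 - 2*delta)) / 2
            return (1.0 - math.sqrt(1.0 - 2.0 * delta)) / 2.0

        for r in range(1, 6):
            delta = 2.0 ** (-r)  # relative distance of RM(r, m) over F_2
            print(f"r={r}: decoding radius {delta:.4f}  vs  Johnson radius {johnson_radius(delta):.4f}")
        # for r >= 2 the local list-decoding radius 2^{-r} strictly exceeds J(2^{-r});
        # at r = 1 the two coincide (both equal 1/2)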

    Predicting a Protein's Stability under a Million Mutations

    Stabilizing proteins is a foundational step in protein engineering. However, the evolutionary pressure of all extant proteins makes identifying the scarce number of mutations that will improve thermodynamic stability challenging. Deep learning has recently emerged as a powerful tool for identifying promising mutations. Existing approaches, however, are computationally expensive, as the number of model inferences scales with the number of mutations queried. Our main contribution is a simple, parallel decoding algorithm. Our method, Mutate Everything, is capable of predicting the effect of all single and double mutations in one forward pass. It is even versatile enough to predict higher-order mutations with minimal computational overhead. We build Mutate Everything on top of ESM2 and AlphaFold, neither of which was trained to predict thermodynamic stability. We trained on the Mega-Scale cDNA proteolysis dataset and achieved state-of-the-art performance on single and higher-order mutations on the S669, ProTherm, and ProteinGym datasets. Code is available at https://github.com/jozhang97/MutateEverything (NeurIPS 2023).
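
    The "one forward pass" idea described above can be sketched as follows: if a model emits an L x 20 matrix of per-position substitution scores in a single pass, the predicted effects of all L x 19 single mutations can be read off that matrix, and double mutations can be scored by combining entries. The scoring model, the naive additive combination of pairs, and all names below are illustrative assumptions, not the architecture of Mutate Everything.

        import numpy as np

        AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 canonical residues

        def all_single_mutation_scores(score_matrix, sequence):
            # score_matrix: (L x 20) per-position substitution scores produced by
            # ONE forward pass of some model (hypothetical stand-in here);
            # returns a dict mapping mutations like "M1A" to predicted effects
            effects = {}
            for i, wt in enumerate(sequence):
                for j, mt in enumerate(AMINO_ACIDS):
                    if mt != wt:
                        effects[f"{wt}{i + 1}{mt}"] = float(score_matrix[i, j])
            return effects

        def double_mutation_score(effects, mut1, mut2):
            # naive additive combination of two single-mutation effects;
            # a learned pairwise head could replace this sum
            return effects[mut1] + effects[mut2]

        if __name__ == "__main__":
            seq = "MKTAYIAKQR"
            rng = np.random.default_rng(0)
            scores = rng.normal(size=(len(seq), 20))  # stand-in for model outputs
            effects = all_single_mutation_scores(scores, seq)
            print(len(effects))                       # 10 * 19 = 190 single mutations
            print(double_mutation_score(effects, "M1A", "K2R"))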

    A complexity theoretic approach to learning

    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Mathematics, 2002. Includes bibliographical references (leaves 127-138). This thesis details a new vantage point for attacking longstanding problems in machine learning. We use tools from computational complexity theory to make progress on problems from computational learning theory. Our methods yield the fastest and most expressive algorithms to date for learning several fundamental concept classes: * We show that any s-term DNF over n variables can be computed by a polynomial threshold function of order O(n^{1/3} log s). As an immediate consequence we obtain the fastest known DNF learning algorithm, which runs in time 2^{O(n^{1/3})}. * We give the first polynomial-time algorithm to learn an intersection of a constant number of halfspaces under the uniform distribution to within any constant error parameter. We also give the first quasipolynomial-time algorithm for learning any function of a constant number of halfspaces with polynomially bounded weights under any distribution. * We give an algorithm to learn constant-depth polynomial-size circuits augmented with majority gates under the uniform distribution using random examples only. For circuits which contain a polylogarithmic number of majority gates the algorithm runs in quasipolynomial time. Under a suitable cryptographic assumption we show that these are the most expressive circuits which will admit a non-trivial learning algorithm. Our approach relies heavily on giving novel representations of well-known concept classes via complexity-theoretic reductions. We exploit the fact that many results in computational learning theory have a complexity-theoretic analogue or implication. As such, we also obtain new results in computational complexity, including (1) a proof that the 30-year-old lower bound due to Minsky and Papert [88] on the degree of a perceptron computing a DNF formula is tight, and (2) improved constructions of pseudo-random generators, mathematical objects which play a fundamental role in cryptography and derandomization. By Adam Richard Klivans. Ph.D.
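
    The link between the degree bound in the first bullet and the 2^{O(n^{1/3})}-type running time is the standard feature-expansion argument (a sketch under the usual assumptions, not a quotation from the thesis): a degree-d polynomial threshold function over n Boolean variables is a halfspace over the multilinear monomials of degree at most d, so a polynomial-time halfspace learner run over those features learns the class in time polynomial in their number:

        \sum_{i=0}^{d} \binom{n}{i} \;\le\; (d+1)\,n^{d} \;=\; n^{O(d)},
        \qquad\text{so for } d = O\!\bigl(n^{1/3}\log s\bigr):\quad
        n^{O(d)} \;=\; 2^{O(n^{1/3}\,\log s\,\log n)} \;=\; 2^{\tilde{O}(n^{1/3})}
        \ \text{ for } s = \mathrm{poly}(n).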